Different linguistic expressions can conceptualize the same event from different perspectives by emphasizing certain participants over others. Here, we investigate a case that has social consequences: how do linguistic expressions of gender-based violence (GBV) influence who we perceive as responsible? We build on previous psycholinguistic research in this area and conduct a large-scale perception survey of GBV descriptions automatically extracted from a corpus of Italian newspapers. We then train regression models to predict the salience of GBV participants along different dimensions of perceived responsibility. Our best model (fine-tuned BERT) shows solid overall performance, with large differences across dimensions and participants: salient _focus_ is more predictable than salient _blame_, and perpetrators' salience is more predictable than victims' salience. Experiments with ridge regression models over different representations show that features based on linguistic theory perform similarly to word-based features. Overall, we show that different linguistic choices do trigger different perceptions of responsibility, and that such perceptions can be modelled automatically. This work can be a core instrument for raising awareness of the consequences of different perspectives among the general public and among news producers.
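To make the modelling setup concrete, here is a minimal sketch of the ridge-regression variant described above, assuming a hypothetical CSV of GBV descriptions paired with a continuous salience score; the file name, column names, and feature choices are illustrative and not taken from the paper's released materials.

```python
# Sketch: predict a perceived-responsibility salience score from text with
# ridge regression over word-based (TF-IDF) features. All names are
# hypothetical placeholders for the paper's actual data and feature sets.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("gbv_descriptions.csv")              # hypothetical file
texts, scores = df["description"], df["blame_score"]  # hypothetical columns

X_train, X_test, y_train, y_test = train_test_split(
    texts, scores, test_size=0.2, random_state=0
)

# Word-based features; the paper also compares linguistically informed ones.
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = Ridge(alpha=1.0)
model.fit(X_train_vec, y_train)
print("R^2:", r2_score(y_test, model.predict(X_test_vec)))
```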
Figurative language generation is the task of reformulating a given text in a desired figure of speech while still being faithful to the original context. We take the first steps towards multi-figurative language modelling by providing a benchmark for the automatic generation of five common figurative forms in English. We train mFLAG, employing a scheme for multi-figurative language pre-training on top of BART together with a mechanism for injecting the target figurative information into the encoder; this enables the generation of text with the target figurative form from another figurative form, without parallel figurative-figurative sentence pairs. Our approach outperforms all strong baselines. We also offer some qualitative analysis and reflections on the relationship between the different figures of speech.
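As an illustration of control-token conditioning of the kind the abstract describes, the sketch below prepends a target-figure token to the input of a vanilla BART model. This is not the mFLAG architecture or checkpoint; the model name and the token inventory are assumptions made for the example.

```python
# Sketch: inject the target figure of speech as a special control token
# prepended to the encoder input. Without fine-tuning on figurative data,
# generation will roughly echo the input; this only shows the mechanism.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# One special token per target figure of speech (hypothetical inventory).
figures = ["<hyperbole>", "<irony>", "<metaphor>", "<simile>", "<idiom>"]
tokenizer.add_special_tokens({"additional_special_tokens": figures})
model.resize_token_embeddings(len(tokenizer))

def transfer(text: str, target_figure: str) -> str:
    """Generate a paraphrase conditioned on the target figure's token."""
    inputs = tokenizer(f"{target_figure} {text}", return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(transfer("The bag was very heavy.", "<hyperbole>"))
```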
Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can lead models to exploit spurious cues rather than learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or `soccer') highlights videos with transcribed live commentaries in English. As the course of a game is unpredictable, so are commentaries, which makes them a unique resource to investigate dynamic language grounding. We also provide state-of-the-art baselines for the following tasks: frame reordering, moment retrieval, live commentary retrieval, and play-by-play live commentary generation. Results show that SOTA models perform reasonably well in most tasks. We discuss the implications of these results and suggest new tasks for which GOAL can be used. Our codebase is available at: https://gitlab.com/grounded-sport-convai/goal-baselines.
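As a rough illustration of what a retrieval baseline for such tasks can look like, the sketch below ranks candidate commentary lines against a video-clip embedding by cosine similarity. The embeddings are random placeholders standing in for pretrained video and text encoder outputs; nothing here reproduces the actual GOAL baselines.

```python
# Sketch: commentary retrieval as cosine-similarity ranking between one
# video-moment embedding and a pool of candidate commentary embeddings.
import numpy as np

rng = np.random.default_rng(0)
clip_emb = rng.normal(size=512)                # one video moment (placeholder)
commentary_embs = rng.normal(size=(100, 512))  # 100 candidate lines (placeholder)

def cosine_rank(query: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Return candidate indices sorted by descending cosine similarity."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

top5 = cosine_rank(clip_emb, commentary_embs)[:5]
print("Top-5 candidate commentary indices:", top5)
```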
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals.

Our goal is a broad understanding of the resources required for private learning in terms of samples, computation time, and interaction. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. This result dispels the similarity between learning with noise and private learning (both must be robust to small changes in inputs), since parity is thought to be very hard to learn given random classification noise.

Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Therefore, for local private learning algorithms, the similarity to learning with noise is stronger: local learning is equivalent to SQ learning, and SQ algorithms include most known noise-tolerant learning algorithms. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms. Because of the equivalence to SQ learning, this result also separates adaptive and nonadaptive SQ learning.
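To make the local-model discussion concrete, here is the textbook randomized-response mechanism, the canonical local private algorithm the abstract refers to. This is an illustrative sketch of the standard mechanism, not code from the paper.

```python
# Sketch: randomized response for a binary attribute. Each respondent
# reports their true bit with probability e^eps / (1 + e^eps), otherwise
# the flipped bit; this satisfies eps-local differential privacy, and the
# aggregate proportion can be debiased afterwards.
import math
import random

def randomized_response(true_bit: int, eps: float) -> int:
    """Report true_bit with prob e^eps/(1+e^eps); eps-LDP by construction."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return true_bit if random.random() < p_truth else 1 - true_bit

def debias(reported_mean: float, eps: float) -> float:
    """Unbiased estimate of the true proportion from the noisy reports."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return (reported_mean - (1.0 - p)) / (2.0 * p - 1.0)

random.seed(0)
truth = [random.random() < 0.3 for _ in range(10_000)]  # 30% have the trait
reports = [randomized_response(int(b), eps=1.0) for b in truth]
print("Estimated proportion:", debias(sum(reports) / len(reports), eps=1.0))
```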